AITopics

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre:

Research Report (1.00)
Questionnaire & Opinion Survey (0.68)

Industry:

Information Technology > Security & Privacy (0.46)
Transportation > Ground > Road (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsFeb-8-2026, 12:45:43 GMT

3f9bf45ea04c98ad7cb857f951f499e2-Supplemental-Conference.pdf

dataset, target label, trojan, (16 more...)

Genre: Research Report > New Finding (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsDec-24-2025, 02:22:11 GMT

Rethinking the Reverse-engineering of Trojan Triggers

Deep Neural Networks are vulnerable to Trojan (or backdoor) attacks. Reverse-engineering methods can reconstruct the trigger and thus identify affected models. Existing reverse-engineering methods only consider input space constraints, e.g., trigger size in the input space.Expressly, they assume the triggers are static patterns in the input space and fail to detect models with feature space triggers such as image style transformations. We observe that both input-space and feature-space Trojans are associated with feature space hyperplanes.Based on this observation, we design a novel reverse-engineering method that exploits the feature space constraint to reverse-engineer Trojan triggers. Results on four datasets and seven different attacks demonstrate that our solution effectively defends both input-space and feature-space Trojans. It outperforms state-of-the-art reverse-engineering methods and other types of defenses in both Trojaned model detection and mitigation tasks. On average, the detection accuracy of our method is 93%. For Trojan mitigation, our method can reduce the ASR (attack success rate) to only 0.26% with the BA (benign accuracy) remaining nearly unchanged.

name change, rethinking, reverse-engineering, (8 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.60)

Ahmari, Reza, Mohammadi, Ahmad, Hemmati, Vahid, Mynuddin, Mohammed, Mahmoud, Mahmoud Nabil, Kebria, Parham, Homaifar, Abdollah, Saif, Mehrdad

An Experimental Study of Trojan Vulnerabilities in UAV Autonomous Landing

arXiv.org Artificial IntelligenceOct-27-2025

This study investigates the vulnerabilities of autonomous navigation and landing systems in Urban Air Mobility (UAM) vehicles. Specifically, it focuses on Trojan attacks that target deep learning models, such as Convolutional Neural Networks (CNNs). Trojan attacks work by embedding covert triggers within a model's training data. These triggers cause specific failures under certain conditions, while the model continues to perform normally in other situations. We assessed the vulnerability of Urban Autonomous Aerial Vehicles (UAAVs) using the DroNet framework. Our experiments showed a significant drop in accuracy, from 96.4% on clean data to 73.3% on data triggered by Trojan attacks. To conduct this study, we collected a custom dataset and trained models to simulate real-world conditions. We also developed an evaluation framework designed to identify Trojan-infected models. This work demonstrates the potential security risks posed by Trojan attacks and lays the groundwork for future research on enhancing the resilience of UAM systems.

artificial intelligence, deep learning, machine learning, (18 more...)

2510.20932

Country: North America > United States > Alabama (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Aerospace & Defense (0.98)
Government > Military (0.95)
Transportation > Air (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsOct-9-2025, 12:52:53 GMT

Red Teaming Deep Neural Networks with Feature Synthesis Tools

We argue that this is due, in part, to a common feature of many interpretability methods: they analyze model behavior by using a particular dataset.

artificial intelligence, machine learning, visualization, (16 more...)

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre:

Research Report (1.00)
Questionnaire & Opinion Survey (0.68)

Industry:

Information Technology > Security & Privacy (0.46)
Transportation > Ground > Road (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsAug-14-2025, 08:49:28 GMT

A Appendix

Roadmap: More details of Algorithm 1 is introduced in A.1. In A.5, we visualize our reverse-engineered Trojans. In this section, we discuss more details of our Reverse-engineering Algorithm (Algorithm 1). It is set to 400 in this paper. The batch size is set to 128 by default. In line 5, we calculate the loss value specified in Eq. 1, where The optimizer used is Adam [69].

dataset, target label, trojan, (16 more...)

Genre: Research Report > New Finding (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Bhasin, Vedant, Yudin, Matthew, Stefanescu, Razvan, Izmailov, Rauf

Trojan Detection Through Pattern Recognition for Large Language Models

arXiv.org Artificial IntelligenceJan-20-2025

Trojan backdoors can be injected into large language models at various stages, including pretraining, fine-tuning, and in-context learning, posing a significant threat to the model's alignment. Due to the nature of causal language modeling, detecting these triggers is challenging given the vast search space. In this study, we propose a multistage framework for detecting Trojan triggers in large language models consisting of token filtration, trigger identification, and trigger verification. We discuss existing trigger identification methods and propose two variants of a black-box trigger inversion method that rely on output logits, utilizing beam search and greedy decoding respectively. We show that the verification stage is critical in the process and propose semantic-preserving prompts and special perturbations to differentiate between actual Trojan triggers and other adversarial strings that display similar characteristics. The evaluation of our approach on the TrojAI and RLHF poisoned model datasets demonstrates promising results.

large language model, machine learning, natural language, (17 more...)

2501.11621

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
Europe > United Kingdom (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Transportation (0.67)
Information Technology > Security & Privacy (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

arXiv.org Artificial IntelligenceNov-19-2024

Trojan Cleansing with Neural Collapse

Gu, Xihe, Fields, Greg, Jandali, Yaman, Javidi, Tara, Koushanfar, Farinaz

Trojan attacks are sophisticated training-time attacks on neural networks that embed backdoor triggers which force the network to produce a specific output on any input which includes the trigger. With the increasing relevance of deep networks which are too large to train with personal resources and which are trained on data too large to thoroughly audit, these training-time attacks pose a significant risk. In this work, we connect trojan attacks to Neural Collapse, a phenomenon wherein the final feature representations of over-parameterized neural networks converge to a simple geometric structure. We provide experimental evidence that trojan attacks disrupt this convergence for a variety of datasets and architectures. We then use this disruption to design a lightweight, broadly generalizable mechanism for cleansing trojan attacks from a wide variety of different network architectures and experimentally demonstrate its efficacy.

artificial intelligence, machine learning, neural collapse, (19 more...)

2411.12914

Country:

North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > San Diego County > La Jolla (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report > New Finding (0.47)

Industry: Information Technology > Security & Privacy (0.95)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Neural Information Processing SystemsOct-10-2024, 18:44:24 GMT

Rethinking the Reverse-engineering of Trojan Triggers

rethinking, reverse-engineering, trojan trigger, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.63)

arXiv.org Artificial IntelligenceFeb-12-2024

Game of Trojans: Adaptive Adversaries Against Output-based Trojaned-Model Detectors

Sahabandu, Dinuka, Xu, Xiaojun, Rajabi, Arezoo, Niu, Luyao, Ramasubramanian, Bhaskar, Li, Bo, Poovendran, Radha

We propose and analyze an adaptive adversary that can retrain a Trojaned DNN and is also aware of SOTA output-based Trojaned model detectors. We show that such an adversary can ensure (1) high accuracy on both trigger-embedded and clean samples and (2) bypass detection. Our approach is based on an observation that the high dimensionality of the DNN parameters provides sufficient degrees of freedom to simultaneously achieve these objectives. We also enable SOTA detectors to be adaptive by allowing retraining to recalibrate their parameters, thus modeling a co-evolution of parameters of a Trojaned model and detectors. We then show that this co-evolution can be modeled as an iterative game, and prove that the resulting (optimal) solution of this interactive game leads to the adversary successfully achieving the above objectives. In addition, we provide a greedy algorithm for the adversary to select a minimum number of input samples for embedding triggers. We show that for cross-entropy or log-likelihood loss functions used by the DNNs, the greedy algorithm provides provable guarantees on the needed number of trigger-embedded input samples. Extensive experiments on four diverse datasets -- MNIST, CIFAR-10, CIFAR-100, and SpeechCommand -- reveal that the adversary effectively evades four SOTA output-based Trojaned model detectors: MNTD, NeuralCleanse, STRIP, and TABOR.

adversary, detection, detector, (15 more...)

2402.08695

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Illinois (0.04)
Asia > Nepal (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (0.93)
Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)